Embedding the Ulam metric into ℓ1

Authors

  • Moses Charikar
  • Robert Krauthgamer
Abstract

Edit distance is a fundamental measure of distance between strings, the extensive study of which has recently focused on computational problems such as nearest neighbor search, sketching and fast approximation. A very powerful paradigm is to map the metric space induced by the edit distance into a normed space (e.g., ℓ1) with small distortion, and then use the rich algorithmic toolkit known for normed spaces. Although the minimum distortion required to embed edit distance into ℓ1 has received a lot of attention lately, there is a large gap between known upper and lower bounds. We make progress on this question by considering large, well-structured submetrics of the edit distance metric space. Our main technical result is that the Ulam metric, namely, the edit distance on permutations of length at most n, embeds into ℓ1 with distortion O(log n). This immediately leads to sketching algorithms with constant size sketches, and to efficient approximate nearest neighbor search algorithms, with approximation factor O(log n). The embedding and its algorithmic consequences present a big improvement over those previously known for the Ulam metric, and they are significantly better than the state of the art for edit distance in general. Further, we extend these results for the Ulam metric to edit distance on strings that are (locally) non-repetitive, i.e., strings where (close by) substrings are distinct.

ACM Classification: F.2.2, G.2.1, G.3
AMS Classification: 68P05, 68W20, 68W25
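To make the object of study concrete, the following is a minimal Python sketch of one common way to compute a distance of Ulam type between two permutations. It is not the embedding from the paper. It assumes both inputs are permutations of the same set of distinct symbols (the abstract also allows differing lengths, which this sketch does not handle), and it counts the minimum number of "remove a symbol and reinsert it elsewhere" moves, which equals n minus the length of the longest common subsequence. The standard edit distance with substitutions lies within a factor of 2 of this quantity. The function name `ulam_distance` is illustrative only.

```python
from bisect import bisect_left


def ulam_distance(p, q):
    """Minimum number of single-symbol moves turning permutation p into q.

    Assumption: p and q are permutations of the same distinct symbols.
    The result equals len(p) minus the longest common subsequence length.
    """
    if len(p) != len(q) or len(set(p)) != len(p) or set(p) != set(q):
        raise ValueError("p and q must be permutations of the same distinct symbols")

    # Relabel each symbol of p by its position in q. A common subsequence of
    # p and q then corresponds to an increasing subsequence of the relabeling.
    position_in_q = {symbol: i for i, symbol in enumerate(q)}
    relabeled = [position_in_q[symbol] for symbol in p]

    # Patience-sorting computation of the longest increasing subsequence:
    # tails[k] is the smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for value in relabeled:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value

    return len(p) - len(tails)
```

For example, `ulam_distance([1, 2, 3, 4, 5], [2, 3, 4, 5, 1])` evaluates to 1, since moving the symbol 1 to the end suffices. The computation runs in O(n log n) time because each symbol triggers one binary search.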

Related articles

Overcoming the ℓ1 Non-Embeddability Barrier: Algorithms for Product Metrics

A common approach for solving computational problems over a difficult metric space is to embed the “hard” metric into L1, which admits efficient algorithms and is thus considered an “easy” metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: it is provably impossible to go below certain approxim...

On Ulam-von Neumann Transformations

We define and study Ulam-von Neumann transformations which are certain interval mappings and conjugate to q(x) = 1 − 2x^2 on [−1, 1]. We use a singular metric on [−1, 1] to study a Ulam-von Neumann transformation. This singular metric is universal in the sense that it does not depend on any particular mapping but only on the exponent of this mapping at its unique critical point. We give the smooth ...

The Computational Hardness of Estimating Edit Distance

We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of estimating the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation...

Publication date: 2006